Taking on new challenges in multi-word unit processing for machine translation
Authors
Abstract
This paper discusses a qualitative comparative evaluation of the output of two machine translation systems that take different approaches to processing multi-word units. It proposes a way to overcome the difficulties that multi-word units pose for machine translation, using a methodology that combines the lexicon-grammar approach with the OpenLogos ontology and semantico-syntactic rules. The paper also discusses why qualitative evaluation metrics are needed to assess correctly how machine translation engines perform on multi-word units.
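To make the idea of recognizing multi-word units concrete, here is a minimal Python sketch of greedy longest-match lookup against an MWU lexicon. It is not the paper's method. The lexicon entries, the class labels (`V_PHRASAL`, `V_IDIOM`, `N_COMPOUND`, `WORD`), and the function `segment_mwus` are hypothetical. The sketch also handles only contiguous units, whereas lexicon-grammar descriptions and OpenLogos semantico-syntactic rules are meant to cover discontinuous and variable patterns such as "take [X] into account".

```python
from typing import Dict, List, Tuple

# Hypothetical MWU lexicon: lowercased token tuple -> illustrative class label.
# Neither the entries nor the labels come from OpenLogos; they are assumptions.
MWU_LEXICON: Dict[Tuple[str, ...], str] = {
    ("take", "into", "account"): "V_PHRASAL",
    ("kick", "the", "bucket"): "V_IDIOM",
    ("machine", "translation"): "N_COMPOUND",
}


def segment_mwus(
    tokens: List[str], lexicon: Dict[Tuple[str, ...], str]
) -> List[Tuple[str, str]]:
    """Greedy left-to-right longest-match segmentation (hypothetical helper).

    Returns (surface form, label) pairs. Tokens not covered by any
    multi-token lexicon entry are emitted singly with the label "WORD".
    """
    max_len = max((len(key) for key in lexicon), default=1)
    lowered = [t.lower() for t in tokens]
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, down to two tokens,
        # so a longer unit wins over any shorter unit it contains.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                match = (n, lexicon[key])
                break
        if match is not None:
            n, label = match
            out.append((" ".join(tokens[i:i + n]), label))
            i += n
        else:
            out.append((tokens[i], "WORD"))
            i += 1
    return out


if __name__ == "__main__":
    sentence = "We must take into account machine translation errors".split()
    for surface, label in segment_mwus(sentence, MWU_LEXICON):
        print(f"{label:12} {surface}")
```

Longest-match is chosen here only because it is the simplest deterministic policy. A real system would need to rank competing analyses, for example when a sequence can be read either literally or as an idiom, which is one of the disambiguation problems the paper's evaluation is concerned with.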